-
Notifications
You must be signed in to change notification settings - Fork 16
Nick/create problem #442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nick/create problem #442
Conversation
@ntjohnson1 Thanks, what a major undertaking! This is a departure from the TTB for MATLAB implementation, in that 1) you need to choose a Problem class, add parameterization to that, and then instantiate a (data, solution) pair; and 2) missing data pararmeterization is handled outside of the Problem class in the function signature, leading to two inputs as opposed to one in TTB (e.g., passing Mapping to the TTB implementation, I now see the challenges, as there are so many options and combinations of options. So, I am OK with this for now. I will explicitly reach out to users to request feedback once this lands in its initial form. |
Open to feedback or a design discussion if people have thoughts. I had a little bit of consideration when I put this together but biased towards trying to make it more explicit/clear on expectations which might hurt usability a little. I'm sure there's at least some improvments for the ergonomics. |
@ntjohnson1 Everything seems to align with TTB for MATLAB, except the Sparse Problem creation. The number of nonzeros seems to be significantly lower in sz = [20 15 10];
nf = 4;
A = cell(3,1);
for n = 1:length(sz)
A{n} = rand(sz(n), nf);
for r = 1:nf
p = randperm(sz(n));
idx = p(1:round(.2*sz(n)));
A{n}(idx,r) = 10 * A{n}(idx,r);
end
end
S = ktensor(A);
S = normalize(S,'sort',1);
info = create_problem('Soln', S, 'Sparse_Generation', 0.5);
num_nonzeros = nnz(info.Data)
total_insertions = sum(info.Data.vals)
num_nonzeros =
718
total_insertions =
1500 shape = (20, 15, 10)
num_factors = 4
A = []
for n in range(len(shape)):
A.append(np.random.rand(shape[n], num_factors))
for r in range(num_factors):
p = np.random.permutation(np.arange(shape[n]))
idx = p[1 : round(0.2 * shape[n])]
A[n][idx, r] *= 10
S = ttb.ktensor(A)
S.normalize(sort=True);
existing_params = ExistingCPSolution(S, noise=0.0, sparse_generation=0.5)
solution, data = create_problem(existing_params)
print(
f"num_nozeros: {data.nnz}\n"
f"total_insertions: {np.sum(data.vals)}\n"
)
num_nozeros: 258
total_insertions: 1500.0 I am willing to merge for now and then add an Issue to address this specific discrepancy. What do you think of that approach? |
Sounds reasonable to me or we could just disable sparse generation for now. That setting is the only thing that routes through |
Thanks, I'll merge it and add a new Issue for |
…d in the example (as it is optional).
If you have more bandwidth to look at this it looks like I normalized the factor matrices incorrectly so the probabilities summed to more than 1 which caused the issue. With your sample script this yields much more reasonable results. (~700-740). But I'm also fine to take this over to the followup diff --git a/pyttb/create_problem.py b/pyttb/create_problem.py
index 36d1c97..f14d2a6 100644
--- a/pyttb/create_problem.py
+++ b/pyttb/create_problem.py
@@ -593,7 +593,7 @@ def generate_data_sparse(
# Convert solution to probability tensor
# NOTE: Make copy since normalize modifies in place
- P = solution.copy().normalize(mode=0)
+ P = solution.copy().normalize(normtype=1)
eta = np.sum(P.weights)
P.weights /= eta |
Got it, this makes sense. Thanks for the catch! I'll make the change. |
…or Sparse Problem.
It would be nice to land #432 first since that resolves all of our doc warnings. I didn't fully merge it here so if that merges first I probably need to review this to make sure I didn't break any new stuff. Alternatively if we want to land this first I can revert the few unrelated commits (they made local iteration cleaner to avoid whatever mac precision thing I'm hitting).
This resolves #312. I added tests but hard to say if things are totally correct. For the sparse generation example in the tutorial I never fully groked that before and didn't dig into it now but the results all seem to look reasonable. There will probably be follow on after this to actual move usages over to this implementation, related to #363. This ended up being a large task than expected so punting on that for now.
Style things:
📚 Documentation preview 📚: https://pyttb--442.org.readthedocs.build/en/442/